Indices of Microbial Diversity Relating to Health

ABSTRACT

Provided herein are methods for altering a health state of a subject by administering a wellness intervention to a subject found to have quantitative measures of microbial genera in the subject's microbiome that are associated with undesirable health states. Undesirable health states can be inferred by executing models that predict health states based on the quantitative measures, such as relative amounts of selected microbial genera to all microbes. Models are created by statistical methods that analyze datasets that include, for each of a plurality of subjects, verified health states and quantitative measures of each of a plurality of microbes classified at designated taxonomic levels, e.g., the genus level.

REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of the priority date of U.S. Provisional application 62/809,873, filed Feb. 25, 2019, the contents of which are incorporated herein in their entirety by reference.

BACKGROUND

A goal of microbiome studies is to reveal the mechanisms by which host genetic and environmental factors shape the human microbiome. Insights gained from these types of studies may contribute to improved health outcomes, including therapeutic strategies for modulating microbiome composition in human health and disease states. However, despite significant public interest in this area of scientific investigation, conclusive data linking detection and causality remain elusive. Consequently, the study of microbiome health claims is fraught with correlative claims that are difficult to substantiate or link to a mechanism.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate exemplary embodiments and, together with the description, further serve to enable a person skilled in the pertinent art to make and use these embodiments and others that will be apparent to those skilled in the art. The invention will be more particularly described in conjunction with the following drawings, wherein:

FIG. 1 shows the cumulative percentage of samples for which a given number of genera account for 50%, 75% and 90%, respectively, of the total microbial biomass. The results show that in 97.8% of samples, 50% of total microbial biomass is accounted for by the top 5 genera. In 99.7% of samples, 50% of total microbial biomass is accounted for by the top 10 genera. In over 95% of samples, 75% of total microbial biomass is accounted for by the top 11 genera. In over 90% of samples, 90% of total microbial biomass is accounted for by the top 18 genera.
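
The quantity plotted in FIG. 1 can be computed per sample by ranking genera by abundance and counting how many are needed to reach a given fraction of the total. The sketch below is illustrative only: it assumes each sample is represented as a mapping of genus name to an abundance value (e.g., read counts or biomass estimates), and the function names are hypothetical.

```python
from typing import Dict, List


def genera_to_reach(abundances: Dict[str, float], fraction: float) -> int:
    """Smallest number of top-ranked genera whose combined abundance reaches `fraction` of the sample total."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("sample has no microbial abundance")
    running = 0.0
    # Rank genera from most to least abundant and accumulate until the target fraction is reached.
    for count, value in enumerate(sorted(abundances.values(), reverse=True), start=1):
        running += value
        if running >= fraction * total:
            return count
    return len(abundances)


def percent_of_samples_within(samples: List[Dict[str, float]], fraction: float, top_n: int) -> float:
    """Percentage of samples in which the top `top_n` genera account for at least `fraction` of the total."""
    hits = sum(1 for sample in samples if genera_to_reach(sample, fraction) <= top_n)
    return 100.0 * hits / len(samples)
```

Under these assumptions, percent_of_samples_within(samples, 0.5, 5) corresponds to the "top 5 genera account for 50% of biomass" statistic described for FIG. 1.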

SUMMARY

A statistically significant correlation has been found between genus-level quantities of the fecal microbiome and health conditions such as diabetes, body mass index and dysbiosis. It has also been found that microbial diversity is captured by relatively few genera. This concept has been used to aggregate analyses and identify major trends. This ability can be used to design genus-level taxonomy and/or 16S-specific primer sets targeting microorganisms associated with these diseases, for use in nucleic acid amplification and detection strategies that permit detection of specific targets of high value to human health associated with the above diseases. This further provides generic and/or actionable guidance to test result recipients. These results suggest that health practices that may ameliorate the health status of subjects, via exercise, diet or healthy eating practices, may be valuable to pursue.

Provided herein are methods of developing mathematical models for inferring the status of a health condition in a subject. The methods involve providing a training dataset that includes data on a plurality of subjects. The data for each subject include: (1) the status of the health condition as verified by a medical practitioner, and (2) quantitative measures of each of a plurality of gut microorganisms for which the genus is provided or can be determined. The training dataset is used to train a learning algorithm to generate a model that infers the state of an individual based on quantities of microorganisms at the genus level.
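
As one possible instance of such a learning algorithm, a single-feature logistic regression can be fit to the training dataset by gradient descent. The sketch below is a minimal, hypothetical illustration: the SubjectRecord layout, the use of one genus as the only feature, and the learning-rate and epoch settings are assumptions, not the models actually developed here.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubjectRecord:
    # Hypothetical training-set row: one medically verified state and one genus-level measure.
    has_condition: bool       # status of the health condition as verified by a medical practitioner
    relative_amount: float    # fraction of total microbes belonging to the selected genus (0 to 1)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(records: List[SubjectRecord], lr: float = 0.5, epochs: int = 2000) -> Tuple[float, float]:
    """Fit p(condition) = sigmoid(w * x + b) by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(records)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for r in records:
            error = sigmoid(w * r.relative_amount + b) - (1.0 if r.has_condition else 0.0)
            grad_w += error * r.relative_amount
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_probability(model: Tuple[float, float], relative_amount: float) -> float:
    w, b = model
    return sigmoid(w * relative_amount + b)
```

A multi-genus model would replace the single weight with one weight per genus; the single-feature form is shown only because the text emphasizes single-genus associations.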

Such models can be used in methods of predicting or inferring a health state of an individual. Such methods involve providing, from the individual, a gut, vaginal canal, endometrial cavity, saliva, cell bound or cell containing or other biological fluid derived microbiome sample, for example, a fecal sample. The relative amounts of microbes from specified genera to total microbes are then determined from the sample. The specified genera are those used by the model to infer the health state in question. Quantitative measures of the specified genera can be determined, for example, by selective amplification of 16S RNA characteristic of each genus in the group of genera. For example, the specified genera can be quantified using qPCR or sequencing methodologies.
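
When the quantitative measures come from sequencing, the relative amount of the specified genera can be taken as the fraction of genus-assigned reads that fall in those genera. This minimal sketch assumes reads have already been assigned to genera and treats read counts as proportional to abundance, which ignores, for example, differences in 16S gene copy number between genera; the function name is hypothetical.

```python
from typing import Dict, Iterable


def relative_amount(genus_counts: Dict[str, int], specified_genera: Iterable[str]) -> float:
    """Fraction (0 to 1) of all genus-assigned reads that belong to the specified genera."""
    total = sum(genus_counts.values())
    if total == 0:
        raise ValueError("no reads were assigned to any genus")
    # set() guards against counting a genus twice if it is listed twice.
    selected = sum(genus_counts.get(genus, 0) for genus in set(specified_genera))
    return selected / total
```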

In one aspect provided herein is a method comprising: a) predicting a state of a health condition in a subject by: i) providing a microbiome sample (e.g., a gut, vaginal canal, endometrial cavity, saliva, cell bound or cell containing or other biological fluid derived microbiome sample) from a subject; ii) determining, from the sample, a quantitative measure of microbes in one or a plurality of selected microbial genera; and iii) applying a model to the quantitative measure to predict a state of the health condition; and b) recommending or administering to the subject an intervention selected from a food, a supplement, a probiotic, a lifestyle change or a pharmaceutical drug in an amount or degree sufficient to alter or maintain the state of the health condition. In one embodiment the state of the health condition is given as a score, and altering the state of the health condition comprises altering the score. In another embodiment the method further comprises monitoring the state of the health condition in the subject over a period of time by repeating operations (i), (ii) and (iii) one or a plurality of times. In another embodiment the health condition is overweight Body Mass Index with Type II diabetes, and the selected microbial genus is taxonomic grouping “929”. In another embodiment the quantitative measure is a relative amount of microbes belonging to taxonomic grouping “929” to total microbes. In another embodiment the model predicts that overweight Body Mass Index with Type II diabetes is present when the relative amount is above a threshold percent. In another embodiment the health condition is normal BMI with ulcerative colitis, and the selected microbial genus is taxonomic grouping “327”. In another embodiment the quantitative measure is a relative amount of microbes belonging to taxonomic grouping “327” to total microbes. In another embodiment the model predicts that normal BMI with ulcerative colitis is present when the relative amount is above a threshold percent. In another embodiment the health condition is overweight Body Mass Index with Type II diabetes, and the selected microbial genus is taxonomic grouping “878”. In another embodiment the quantitative measure is a relative amount of microbes belonging to taxonomic grouping “878” to total microbes.
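
A single-genus rule of the kind recited above, together with repeated measurement for monitoring, can be sketched as follows. The threshold_percent value of 5.0 used in the example is purely hypothetical; the actual threshold for any grouping (e.g., “929”) would come from model development, and the class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThresholdModel:
    # Hypothetical single-genus rule; threshold_percent must come from model development.
    grouping_id: str           # e.g., taxonomic grouping "929"
    condition: str             # health condition the rule predicts
    threshold_percent: float   # relative amount (percent of total microbes) above which the condition is predicted

    def predict(self, relative_percent: float) -> bool:
        """Predict that the condition is present when the relative amount is above the threshold."""
        return relative_percent > self.threshold_percent


def monitor(model: ThresholdModel, measurements_percent: List[float]) -> List[bool]:
    """Repeat the prediction for successive samples from the same subject, e.g., during an intervention."""
    return [model.predict(m) for m in measurements_percent]


rule = ThresholdModel(grouping_id="929",
                      condition="overweight Body Mass Index with Type II diabetes",
                      threshold_percent=5.0)   # hypothetical threshold
history = monitor(rule, [8.0, 6.1, 3.2])       # three samples taken over time
```

Here the third sample falls below the hypothetical threshold, which under this rule would be read as the condition no longer being predicted.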

In another aspect provided herein is a method comprising: a) accessing by computer a dataset comprising, for each of a plurality of subjects, (1) a medically verified state of one or a plurality of health conditions, and (2) quantitative measures of amounts of microbes belonging to one or a plurality of selected microbial genera in a microbiome of the subject; and b) performing statistical analysis on the dataset to develop a model that predicts the state of the health condition in a subject based on the quantitative measures. In one embodiment the quantitative measure comprises the relative amount of microbes belonging to the genus, e.g., compared to all microbes. In another embodiment the model predicts the state of the health condition based on the relative amount of a single microbial genus compared with the total amount of microbes. In another embodiment the quantitative measure comprises an absolute amount of microbes belonging to the genus. In another embodiment the microbiome is a gut or other sample site microbiome. In another embodiment the measures are determined from a feces sample from the subject. In another embodiment the plurality of subjects is at least any of 1000, 2000, 4000, 8000, 14,000 or 30,000 subjects. In another embodiment the subjects are animals, e.g., fish, avians, amphibians, reptiles, or mammals, e.g., humans. In another embodiment the dataset is provided by curating the American Gut Project database. In another embodiment the dataset is provided by: i) accessing a database comprising quantitative nucleic acid sequence information from microbes in a microbiome, e.g., a gut, vaginal canal, endometrial cavity, saliva, cell bound or cell containing or other biological fluid derived microbiome; and ii) assigning microbial taxonomic group identities at the genus level for microbes in the microbiome based on the nucleic acid sequence information.
In another embodiment the quantitative nucleic acid sequence information comprises quantitative measures of 16S RNA, and the microbial taxonomic group identities are assigned, at least in part, based on 16S RNA sequences. In another embodiment microbial taxonomic group identities are based on the NCBI taxonomy database, the Disbiome database (disbiome.ugent.be), the Microbiome Database (microbiomedb.org/mbio/app/) or the Integrated Microbial Genomes and Microbiomes database (img.jgi.doe.gov/). In another embodiment the health conditions are selected from underweight Body Mass Index (“BMI”), overweight Body Mass Index, normal Body Mass Index, obese Body Mass Index, normal Body Mass Index with Type II diabetes, overweight Body Mass Index with Type II diabetes, obese Body Mass Index with Type II diabetes, normal Body Mass Index with Type I diabetes, normal Body Mass Index with ulcerative colitis, and unknown Body Mass Index, otherwise presumed healthy. In another embodiment the state of the health condition is characterized as a binary (e.g., presence or absence), a probability (e.g., chance of presence or absence), a number (e.g., a measurement or a score on a scale) or a range (e.g., high, medium or low). In another embodiment the statistical analysis comprises a recursive statistical analysis. In another embodiment the statistical analysis comprises: correlational analysis (e.g., Pearson correlation, Spearman correlation), chi-square, comparison of means (e.g., paired T-test, independent T-test, ANOVA), regression analysis (e.g., simple regression, multiple regression, linear regression, non-linear regression, logistic regression, polynomial regression, stepwise regression, ridge regression, lasso regression, elastic net regression) or non-parametric analysis (e.g., Wilcoxon rank-sum test, Wilcoxon signed-rank test, sign test). In another embodiment the statistical analysis is performed by a machine learning algorithm.
In another embodiment the machine learning algorithm is a supervised or unsupervised machine learning algorithm, e.g., selected from artificial neural networks (e.g., back propagation networks), discriminant analyses (e.g., Bayesian classifier or Fisher analysis), support vector machines, decision trees (e.g., recursive partitioning processes such as classification and regression trees, CART), random forests, linear classifiers (e.g., multiple linear regression (MLR) or graphical ANOVA), partial least squares (PLS) regression, principal components regression (PCR), hierarchical clustering and cluster analysis. In another embodiment the model calculates a numerical score that reflects the relative amount of a set of one or more selected microbial genera to the total amount of microbes. In another embodiment the model predicts that the health condition is present if a number representing a relative amount of the one or more microbial genera is above a cutoff value. In another embodiment the model predicts that the health condition is not present if a number representing a relative amount of microbes belonging to a selected taxonomic genus is below a cutoff value. In another embodiment the cutoff value is set to produce a sensitivity, specificity, positive predictive value or negative predictive value of at least any of 75%, 80%, 85%, 90%, 95%, 98% or 99%. In another embodiment the state of the health condition is represented by a measure of confidence of the prediction.
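
Choosing a cutoff value to meet a target sensitivity, specificity, positive predictive value or negative predictive value can be illustrated with a simple search over the observed values in a training set. This is a sketch under stated assumptions: a value above the cutoff is called positive, candidate cutoffs are the observed values (plus one below all of them), and ties among qualifying cutoffs are broken by the Youden index (sensitivity + specificity - 1), which is one common choice rather than a method prescribed here. The function names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def diagnostic_metrics(values: Sequence[float], labels: Sequence[bool], cutoff: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV when a value above `cutoff` is called positive."""
    tp = sum(1 for v, y in zip(values, labels) if v > cutoff and y)
    fp = sum(1 for v, y in zip(values, labels) if v > cutoff and not y)
    fn = sum(1 for v, y in zip(values, labels) if v <= cutoff and y)
    tn = sum(1 for v, y in zip(values, labels) if v <= cutoff and not y)

    def ratio(a: int, b: int) -> float:
        return a / (a + b) if (a + b) else math.nan

    return {"sensitivity": ratio(tp, fn), "specificity": ratio(tn, fp),
            "ppv": ratio(tp, fp), "npv": ratio(tn, fn)}


def choose_cutoff(values: Sequence[float], labels: Sequence[bool],
                  metric: str, target: float) -> Optional[float]:
    """Among cutoffs whose `metric` is at least `target`, return the one with the highest Youden index."""
    candidates = [-math.inf] + sorted(set(values))   # -inf calls every sample positive
    best_cutoff, best_youden = None, -math.inf
    for cutoff in candidates:
        m = diagnostic_metrics(values, labels, cutoff)
        if not m[metric] >= target:   # also rejects NaN (undefined) metrics
            continue
        youden = m["sensitivity"] + m["specificity"] - 1.0
        if youden > best_youden:
            best_cutoff, best_youden = cutoff, youden
    return best_cutoff
```

The function returns None when no candidate cutoff reaches the requested level, e.g., a 99% target on a small or poorly separated training set.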

In another aspect provided herein is a method of predicting a state of a health condition in a subject comprising: a) providing a microbiome sample (e.g., a gut, vaginal canal, endometrial cavity, saliva, cell bound or cell containing or other biological fluid derived microbiome sample) from a subject; b) determining, from the sample, a quantitative measure of microbes in one or a plurality of selected microbial genera; and c) applying a model to the quantitative measure to predict a state of the health condition. In one embodiment the state of the health condition is characterized as a binary (e.g., presence or absence), a probability, a number (e.g., a measurement or a score on a scale) or a range (e.g., high, medium or low). In another embodiment the model predicts that the health condition is present if a number representing a relative amount of microbes belonging to a selected taxonomic genus is above a cutoff value. In another embodiment the cutoff value is set to produce a sensitivity, specificity, positive predictive value or negative predictive value of at least any of 75%, 80%, 85%, 90%, 95%, 98% or 99%. In another embodiment the model predicts that the health condition is not present if a number representing a relative amount of microbes belonging to a selected taxonomic genus is below a cutoff value. In another embodiment the cutoff value is set to produce a sensitivity, specificity, positive predictive value or negative predictive value of at least any of 75%, 80%, 85%, 90%, 95%, 98% or 99%. In another embodiment the quantitative measure is a relative amount of the selected microbial genera to all microbes in the sample. In another embodiment the model predicts the state of the health condition based on a cut-off value, e.g., wherein a relative amount above the cut-off value predicts that the health state is present and a relative amount below the cut-off value predicts that the health state is not present.
In another embodiment the health condition is overweight Body Mass Index with Type II diabetes, and the selected microbial genus is taxonomic grouping “929”. In another embodiment the quantitative measure is a relative amount of microbes belonging to taxonomic grouping “929” to total microbes. In another embodiment the model predicts that overweight Body Mass Index with Type II diabetes is present when the relative amount is above a threshold percent. In another embodiment the health condition is normal BMI with ulcerative colitis, and the selected microbial genus is taxonomic grouping “327”. In another embodiment the quantitative measure is a relative amount of microbes belonging to taxonomic grouping “327” to total microbes. In another embodiment the model predicts that normal BMI with ulcerative colitis is present when the relative amount is above a threshold percent. In another embodiment the health condition is overweight Body Mass Index with Type II diabetes, and the selected microbial genus is taxonomic grouping “878”. In another embodiment the quantitative measure is a relative amount of microbes belonging to taxonomic grouping “878” to total microbes. In another embodiment the sample is selected from human excreta (e.g., feces (stool) or urine), blood, serum, plasma, saliva, or throat swab. In another embodiment determining the relative amount comprises performing qPCR, whole genome amplification (WGA), mass spectrometry, microarray analysis, DNA or RNA sequencing, loop-mediated isothermal amplification, nucleic acid sequence-based amplification, strand displacement amplification, multiple displacement amplification or other isothermal amplification methods on nucleic acids of microbes in the sample. In another embodiment determining the taxonomic grouping from the DNA sequences comprises analysis of microbial 16S RNA sequences.
In another embodiment determining the relative amount comprises: (i) sequencing microbial nucleic acids from the sample to produce DNA sequence information; and (ii) determining from the DNA sequence information the microbes belonging to the single microbial taxonomic grouping and their amounts relative to total microbes in the sample. In another embodiment the method further comprises determining taxonomic groupings of a plurality of other microbes in the sample. In another embodiment determining the quantitative measure of the set of selected microbial genera comprises determining a quantitative measure of the set as a whole. In another embodiment determining the quantitative measures comprises: (i) performing qPCR to determine a quantitative measure of the single microbial taxonomic grouping in the sample; and (ii) performing qPCR to determine a quantitative measure of total microbes in the sample. In another embodiment the qPCR comprises use of primers that selectively amplify sequences of 16S RNA of the single microbial taxonomic grouping. In another embodiment the qPCR comprises multiplexed qPCR. In another embodiment determining the quantitative measures comprises performing microarray analysis. In another embodiment the state of the health condition is represented by a measure of confidence of the prediction. In another embodiment the method comprises executing by computer a model produced by a method as disclosed herein. In another embodiment the model is the product of a machine learning algorithm.
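
Where qPCR is run once with genus-specific primers and once with primers targeting total bacterial 16S sequences, the relative amount is commonly estimated from the difference in threshold cycles (Ct). The sketch below uses the comparative Ct relation, ratio = E^-(Ct_genus - Ct_total), and assumes both assays amplify with the same efficiency factor E (2.0 for perfect doubling per cycle); it also ignores 16S copy-number differences between genera. The function name is hypothetical.

```python
def relative_amount_from_ct(ct_genus: float, ct_total: float, efficiency: float = 2.0) -> float:
    """Estimate (genus template) / (total 16S template) as efficiency ** -(Ct_genus - Ct_total).

    Assumes the genus-specific and total-bacteria assays share the same amplification
    efficiency factor (2.0 means perfect doubling each cycle).
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency factor must be greater than 1")
    delta_ct = ct_genus - ct_total   # a later threshold cycle means less starting template
    return efficiency ** (-delta_ct)
```

For instance, a genus assay crossing threshold 3 cycles after the total-bacteria assay gives 2^-3 = 0.125, i.e., about 12.5% of total microbes under these assumptions.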

In another aspect provided herein is a method comprising administering to a subject predicted, by a method as disclosed herein, to have a health condition in a certain state, a food, supplement, probiotic or lifestyle change in an amount sufficient to alter or maintain the state of the health condition.

In another aspect provided herein is a method comprising: a) predicting, by a method as described herein, a state of a health condition for a subject; and b) communicating over the Internet, to the subject, the state of the health condition.

In another aspect provided herein is a system comprising: (a) a computer comprising: (i) a processor; (ii) a memory, coupled to the processor, the memory including a quantitative measure of microbes in one or a plurality of selected microbial genera, and a model that, when executed, uses the quantitative measure or measures to predict a state of a health condition for the subject; and (iii) computer executable instructions for implementing the classification rule on the data. In one embodiment, the system further comprises: (b) a communications interface that communicates with the Internet.

In another aspect provided herein is a method comprising: (a) receiving from a subject a sample comprising a microbiome of the subject; (b) predicting a health state of the subject by a method as described herein; (I) optionally developing a wellness intervention plan to address a present health state in the subject; and (c) outputting the prediction/intervention plan to an electronic device accessible by the subject or administering the wellness intervention to the subject. In one embodiment, predicting comprises measuring nucleic acid sequences of a microbiome of the subject by: (i) isolating RNA from the sample; (ii) reverse transcribing the RNA into cDNA; and (iii) obtaining sequence information from the cDNA by nucleic acid sequencing to produce sequence reads, or by microarray analysis. In another embodiment predicting comprises measuring nucleic acid sequences of a microbiome of the subject by: (i) isolating DNA from the sample; (ii) amplifying rRNA sequences from the isolated DNA; and (iii) obtaining sequence information from the amplified rRNA sequences by nucleic acid sequencing to produce sequence reads, or by microarray analysis. In another embodiment the method further comprises (iv) mapping the sequence reads to a reference database of microbial nucleotide sequences, such as genomic sequences, and (v) measuring absolute or relative amounts of sequence reads mapping to the database. In another embodiment, predicting comprises measuring nucleic acid sequences of a microbiome of the subject by: (i) amplifying nucleic acids in the sample corresponding to genera associated with the health condition (e.g., by fragmenting nucleic acids from the sample and using primer pairs that specifically or preferentially amplify nucleic acids having nucleotide sequences of the associated genus or genera); (ii) amplifying nucleic acids in the sample corresponding to total rRNA; and (iii) determining relative amounts of the amplified nucleic acids.
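
Steps (iv) and (v) can be illustrated with a toy k-mer matcher that assigns each read to the reference genus sharing the most k-mers and then reports relative amounts. This is a deliberately simplified stand-in, not an alignment-based mapper and not any specific published classifier; the reference sequences, the k value and the function names are all hypothetical, and unassigned reads are excluded from the denominator in this sketch.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(reference: Dict[str, List[str]], k: int) -> Dict[str, Set[str]]:
    """Map each genus to the union of k-mers from its reference sequences."""
    return {genus: set().union(*(kmers(s, k) for s in seqs)) for genus, seqs in reference.items()}


def assign_read(read: str, index: Dict[str, Set[str]], k: int, min_shared: int = 1) -> Optional[str]:
    """Assign a read to the genus sharing the most k-mers with it, or None if too few are shared."""
    read_kmers = kmers(read, k)
    best_genus, best_shared = None, 0
    for genus, genus_kmers in index.items():
        shared = len(read_kmers & genus_kmers)
        if shared > best_shared:   # ties keep the first genus encountered
            best_genus, best_shared = genus, shared
    return best_genus if best_shared >= min_shared else None


def relative_amounts(reads: Iterable[str], index: Dict[str, Set[str]], k: int) -> Dict[str, float]:
    """Relative amount of each genus among the reads that could be assigned."""
    assigned = Counter(g for g in (assign_read(r, index, k) for r in reads) if g is not None)
    total = sum(assigned.values())
    return {genus: count / total for genus, count in assigned.items()} if total else {}
```

Production pipelines use curated 16S reference databases and alignment or probabilistic classifiers instead; the sketch only shows how read-to-reference assignments turn into relative amounts.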

DETAILED DESCRIPTION

I. Introduction

Provided herein are methods of predicting the state of each of a variety of health conditions in a subject based on the population of microbes in the subject's microbiome (e.g., gut, vaginal canal, endometrial cavity, saliva, cell bound or cell containing or other biological fluid derived microbiome). More specifically, it has been found that states of a variety of health conditions are statistically associated with the relative amount of single microbial genera in the microbiome. Taxonomically, the genus is the rank between family and species. It is the sixth level in the NCBI taxonomy catalog, the levels being: (1) kingdom, (2) phylum, (3) class, (4) order, (5) family, (6) genus and (7) species. NCBI taxonomy classifications can be found at the World Wide Web site ncbi.nlm.nih.gov/taxonomy. Other taxonomy databases include, for example, the Disbiome database (disbiome.ugent.be), the Microbiome Database (microbiomedb.org/mbio/app/) or the Integrated Microbial Genomes and Microbiomes database (img.jgi.doe.gov/). Accordingly, by determining quantitative measures of microbes of a single genus and of total microbes in a microbiome sample, one can predict the state of a health condition based on their relative amounts.

Statistical or mathematical models are built in which parameters of health conditions are estimated. An individual operator, such as a health professional, or a computer, can apply the model to data measured from a test subject to predict the state of the health condition. Models can involve numerical values, such as percentages of microbes in a sample belonging to a particular genus. The model may predict the state of the health condition by determining whether a numerical value is above or below a set cutoff value. For example, a measured value above the cutoff value may predict that a particular health condition is present in the subject or that there is a particular confidence that the subject has a health condition. Cutoff values can be set to provide desired levels of various diagnostic measures such as sensitivity, specificity, positive predictive value or negative predictive value.

II. Model Development

Models are developed to predict states of health conditions based on the types and amounts of microorganisms found in a microbiome (e.g., a gut, vaginal canal, endometrial cavity, saliva, cell bound or cell containing or other biological fluid derived microbiome) of a subject. Databases can be used as inputs for model building. Databases typically include data for each of a number of subjects. Data for each subject include a state of one or a plurality of health conditions for the subject and quantitative measures of each of a plurality of microbes belonging to a designated taxonomic category.

A. Database

Taxonomic classification at the genus level provides particularly powerful associations. In certain embodiments the database includes, for each subject, a quantitative measure of microorganisms belonging to designated genera. Alternatively, the database can include, for each subject, quantitative measures of microorganisms at the species level. Quantitative measures at the genus level can be derived from these measures by, for example, combining the measures of all species belonging to a particular genus. In other embodiments, microorganisms can be assigned to a higher taxonomic level such as phylum, class, order or family. Again, the database can include quantitative measures of microorganisms in each category of a taxonomic level, e.g., microorganisms belonging to particular phyla. Alternatively, quantitative measures of microorganisms belonging to various categories at a particular taxonomic level can be derived by combining measures of their subdivisions.

Such databases, when used to train learning algorithms, are typically referred to as a training set or training dataset.

1. Health Conditions

Health conditions inferred by the methods described herein include any health conditions diagnosable by a medical professional qualified to make the diagnosis, e.g., to enter the diagnosis on a medical record. Typically, the medical professional is a doctor, such as an M.D., a D.O. or a chiropractor. However, under appropriate circumstances the medical professional could be a nurse or dietary health practitioner. As used herein, the term “medically verified” refers to a medical diagnosis made by such a medical professional.

Examples of health conditions include, without limitation, underweight Body Mass Index (“BMI”), overweight Body Mass Index, normal Body Mass Index, obese Body Mass Index, normal Body Mass Index with Type II diabetes, overweight Body Mass Index with Type II diabetes, obese Body Mass Index with Type II diabetes, normal Body Mass Index with Type I diabetes, normal Body Mass Index with ulcerative colitis, and unknown Body Mass Index, otherwise presumed healthy. Each of these states can be part of a binary classification. For example, the classifications can be overweight BMI or not overweight BMI; normal BMI with Type I diabetes or not normal BMI with Type I diabetes. For adult men and women, a BMI between 18.5 and 24.9 is considered healthy. Underweight is defined as a BMI below 18.5. Overweight is defined as a BMI between 25.0 and 29.9, and a BMI of 30 or higher is considered obese.
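
The BMI categories above can be expressed as a small helper. This sketch computes BMI as weight in kilograms divided by the square of height in meters and, as an assumption, treats the category boundaries as half-open so that values such as 24.95 (between the stated 24.9 and 25.0) fall into the lower category; the function names are illustrative.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Adult BMI class using the cut points given above (boundaries treated as half-open)."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    return "obese"
```

For example, a 70 kg adult who is 1.75 m tall has a BMI of about 22.9 and falls in the normal category.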

A health condition can be treated as a categorical variable or feature in a vector representing subject data. The state of the health condition can be represented by a value in the feature cell. This value can take any form that differentiates one or more states of the health condition. For example, the state of the health condition can be characterized as a binary (e.g., presence or absence), a number (e.g., a measurement or a score on a scale) or a range (e.g., high, medium or low). Designation of any state of a health condition can implicitly differentiate that state from its complement. For example, the state “overweight Body Mass Index with Type II diabetes” is distinguished from any state that does not include both of these conditions. This includes, for example, not overweight Body Mass Index and not Type II diabetes; overweight Body Mass Index but not Type II diabetes; and not overweight Body Mass Index but Type II diabetes. In other words, an inference of “overweight Body Mass Index with Type II diabetes” is distinguished from these other states. In certain embodiments, the health states distinguish between pathological and nonpathological states.
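
The implicit complement described above can be made concrete as a binary label. In this hypothetical sketch, a subject is labeled 1 for a composite state such as “overweight Body Mass Index with Type II diabetes” only if both parts hold, so the three other combinations all map to 0; the function and argument names, and the use of "normal" to stand in for "not overweight", are assumptions.

```python
from typing import FrozenSet


def composite_label(bmi_class: str, diagnoses: FrozenSet[str],
                    target_bmi: str, target_diagnosis: str) -> int:
    """1 only when both parts of the composite state hold; every other combination is 0."""
    return int(bmi_class == target_bmi and target_diagnosis in diagnoses)


# The four combinations for "overweight BMI with Type II diabetes"; only the first is labeled 1.
combos = [("overweight", frozenset({"type II diabetes"})),
          ("overweight", frozenset()),
          ("normal", frozenset({"type II diabetes"})),
          ("normal", frozenset())]
labels = [composite_label(b, d, "overweight", "type II diabetes") for b, d in combos]
```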

A state of a health condition can be expressed in a variety of ways. These include, without limitation, a binary (e.g., presence (“having”) or absence (“not having” or “not”) of the condition), a probability (e.g., probability that the condition is present or absent), a number or degree (e.g., a measurement or a score on a scale) or a range (e.g., high, medium or low).

2. Taxonomic Information

A database used to develop a model to predict the state of a health condition based on relative amounts of microbes in the microbiome also includes quantitative measures of a plurality of microbes in the microbiome. Quantitative measures of microbes at the genus level can be specified in the database or derivable from it. For example, the database could contain, as features or variables, quantitative measures of microbes belonging to each genus measured. Alternatively, the database could contain, as features or variables, quantitative measures of microbes belonging to each species measured. Based on the genus to which each species belongs, quantitative measures of microbes belonging to each genus can be derived. Such taxonomic information can be included in the database. Accordingly, in certain embodiments, each microorganism is classified at each of a plurality of taxonomic levels, e.g., kingdom, phylum, class, order, family, genus and species.
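
Deriving genus-level (or any higher-level) measures from species-level measures amounts to summing over each species' lineage. This minimal sketch assumes a lineage table mapping each species to its taxon at each rank; the species names, counts and function name are illustrative only, and the lineages are deliberately partial.

```python
from collections import defaultdict
from typing import Dict

RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def roll_up(counts: Dict[str, float], lineages: Dict[str, Dict[str, str]], rank: str) -> Dict[str, float]:
    """Sum species-level measures into the taxon each species belongs to at `rank`."""
    if rank not in RANKS:
        raise ValueError(f"unknown rank: {rank}")
    totals: Dict[str, float] = defaultdict(float)
    for species, value in counts.items():
        # Species with no name recorded at this rank are pooled as "unclassified".
        totals[lineages[species].get(rank, "unclassified")] += value
    return dict(totals)


# Illustrative counts and partial lineages (taxonomy as for groupings "327" and "878").
lineages = {
    "Bifidobacterium longum": {"family": "Bifidobacteriaceae", "genus": "Bifidobacterium"},
    "Bifidobacterium adolescentis": {"family": "Bifidobacteriaceae", "genus": "Bifidobacterium"},
    "Lactobacillus gasseri": {"family": "Lactobacillaceae", "genus": "Lactobacillus"},
}
counts = {"Bifidobacterium longum": 12, "Bifidobacterium adolescentis": 8, "Lactobacillus gasseri": 5}
```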

In certain embodiments a database includes sequences of nucleic acids present in a microbiome sample. Such sequences can be used to determine the taxonomic classification of the organisms from which the sequences derive. So, for example, the database may include sequences of microbial 16S RNA. 16S RNA sequences can be used to classify microorganisms at the genus level and, in some cases, at the species level. The American Gut Project database (humanfoodproject.com/americangut/), (github.com/biocore/American-Gut) and links therein include amounts of 16S RNA sequences found in microbiome samples of subjects. The genus and species of microorganisms to which the sequences correspond can be derived using, e.g., the NCBI taxonomy database or the metadata deposited in The European Bioinformatics Institute.

Next Generation Sequencing (“NGS”) technologies provide reads that typically target the short hypervariable regions of the 16S rRNA gene. This approach is feasible because the individual variable regions of the 16S rRNA gene are in the range of 100-300 bp, which is readily covered by the short paired-end reads produced by many common NGS methods. Longer reads using nanopore or single molecule long-read technology are also envisaged. Earlier approaches such as Sanger sequencing can also be used to generate this information over longer reads, albeit with significantly lower sequencing depth.

The taxonomic classification of 16S rRNA gene sequences is obtained using either a homology-based or a prediction-based approach. Homology-based approaches require alignment of the target 16S rRNA sequence with the 16S rRNA sequences present in reference databases such as the Ribosomal Database Project (1), Greengenes (2) and SILVA (3). Several homology-based pipelines streamline taxonomic analysis of 16S rRNA sequences obtained from metagenomic datasets, including MEGAN (4), PyNAST (5) and, possibly the most commonly used, QIIME (6).

A comparison of these methodologies with a more recent pipeline, “16S Classifier”, is described in Chaudhary N, Sharma A K, Agarwal P, Gupta A, Sharma V K, “16S classifier: a tool for fast and accurate taxonomic classification of 16S rRNA hypervariable regions in metagenomic datasets”, PLoS One. 2015; 10(2):e0116106. Published 2015 Feb. 3. Other classifiers are described in Florian P Breitwieser et al., “A review of methods and databases for metagenomic classification and assembly,” Briefings in Bioinformatics, Volume 20, Issue 4, July 2019, Pages 1125-1136, doi.org/10.1093/bib/bbx120, Published: 23 Sep. 2017.

The dataset comprises data on a plurality of individuals. Data on each individual can include the health state of one or a plurality of different health conditions. The data also include quantitative measures of each of a plurality of microbes from the microbiome of each subject. Preferably, the data identify the microbes at least at the genus level. This includes, for example, providing quantitative measures at the species level, so long as the genus of each species is, or can be, identified. As used herein, the term “genus” refers to the taxonomic rank above species and below family. As used herein, the term “quantitative measure” refers to presence or absence, absolute or relative amounts or concentrations, absolute or relative increases or decreases, and discrete or continuous ranges (e.g., a number, a degree, a level, a threshold, a quantile or a bucket). In some embodiments, the quantitative measure can be an absolute value, a ratio, an average, a median, or a range of numbers.
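
One way to turn a continuous relative amount into a quantile “bucket” is to compute cutpoints from a reference population and place each new value into the corresponding range. This sketch assumes tertiles labeled low, medium and high, uses Python's statistics.quantiles with its default exclusive method, and assigns values falling exactly on a cutpoint to the higher bucket; these are illustrative choices, and the function names are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles
from typing import List, Sequence


def tertile_cutpoints(reference_values: Sequence[float]) -> List[float]:
    """Two cutpoints splitting a reference population into thirds."""
    return quantiles(reference_values, n=3)


def bucket(value: float, cutpoints: Sequence[float], labels=("low", "medium", "high")) -> str:
    """Place a value into the bucket defined by the cutpoints; values on a cutpoint go up."""
    return labels[bisect_right(cutpoints, value)]
```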

A measurement of a variable, such as sequencing reads mapping to a position, can be any combination of numbers and words. A measure can be on any scale, including nominal (e.g., name or category), ordinal (e.g., hierarchical order of categories), interval (distance between members of an order), ratio (interval compared to a meaningful “0”), or a cardinal number measurement that counts the number of things in a set. Measurements of a variable on a nominal scale indicate a name or category, e.g., the category into which a sequencing read is classified. Measurements of a variable on an ordinal scale produce a ranking, such as “first”, “second”, “third”. Measurements on a ratio scale include, for example, any measure on a pre-defined scale, absolute numbers of reads, normalized or estimated numbers, as well as statistical measurements such as frequency, mean, median, standard deviation, or quantile. Measurements that involve quantification are typically determined at the ratio scale level.

The database can be a public or a private data set. It will typically include data for at least any of 500, 1000, 2000, 4000, 8000, 12,000, 16,000, 20,000, 30,000, 40,000, or 80,000 individuals. Typically, the database will include a plurality of individuals for each of a plurality of different health states for any health condition. Furthermore, the number of individuals classified in each state will be sufficient to provide statistically meaningful differentiation of each health state.

Taxonomic groupings identified by number have the following taxonomy:

-   35: kingdom Archaea; phylum Euryarchaeota; class Methanobacteria; order Methanobacteriales; family Methanobacteriaceae; genus Methanobrevibacter
-   327: kingdom Bacteria; phylum Actinobacteria; class Actinobacteria; order Bifidobacteriales; family Bifidobacteriaceae; genus Bifidobacterium
-   421: kingdom Bacteria; phylum Bacteroidetes; class Bacteroidia; order Bacteroidales; family [Paraprevotellaceae]; genus [Prevotella]
-   878: kingdom Bacteria; phylum Firmicutes; class Bacilli; order Lactobacillales; family Lactobacillaceae; genus Lactobacillus
-   929: kingdom Bacteria; phylum Firmicutes; class Clostridia; order Clostridiales; family Lachnospiraceae; genus Clostridium
-   946: kingdom Bacteria; phylum Firmicutes; class Clostridia; order Clostridiales; family Lachnospiraceae; genus [Ruminococcus]
-   1050: kingdom Bacteria; phylum Firmicutes; class Erysipelotrichi; order Erysipelotrichales; family Erysipelotrichaceae; genus Clostridium
-   935: kingdom Bacteria; phylum Firmicutes; class Clostridia; order Clostridiales; family Lachnospiraceae; genus Lachnobacterium

It is understood that the taxonomic position of particular genera, or their names, may be changed by taxonomists over time.

B. Statistical Analysis

The database, once provided, can be analyzed by any statistical method. The analysis will model the association between the variables and the state of the health condition. This can include estimating parameters of the data such as medians, means and standard deviations. The model, once prepared, can use measurements from test subjects to predict the state of the health condition.

Typically, analysis involves statistical analysis of a sufficiently large number of samples to provide statistically meaningful results. Any statistical method known in the art can be used for this purpose. Such methods, or tools, include, without limitation, correlation analysis (e.g., Pearson correlation, Spearman correlation), chi-square tests, comparison of means (e.g., paired t-test, independent t-test, ANOVA), regression analysis (e.g., simple regression, multiple regression, linear regression, non-linear regression, logistic regression, polynomial regression, stepwise regression, ridge regression, lasso regression, elastic net regression) or non-parametric analysis (e.g., Wilcoxon rank-sum test, Wilcoxon signed-rank test, sign test). Such tools are included in commercially available statistical packages such as MATLAB, JMP Statistical Software and SAS. Such methods produce models or classifiers which one can use to classify a particular biomarker profile into a particular state.
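The Wilcoxon rank-sum test listed above is equivalent to the Mann-Whitney U test. As a minimal, dependency-free sketch (the function names are hypothetical), the U statistic comparing a genus's relative abundance between two health states can be computed from pooled ranks, with tied values receiving the average of their ranks. Only the statistic is computed here; a statistical package (for example `scipy.stats.mannwhitneyu`) would normally supply the p-value.

```python
def rank_with_ties(values):
    """Assign 1-based ranks to values, giving tied values the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: rank sum of group_a minus n_a(n_a + 1)/2.
    Equals the number of (a, b) pairs with a > b, counting ties as 0.5."""
    ranks = rank_with_ties(list(group_a) + list(group_b))
    n_a = len(group_a)
    return sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
```

Values of U near 0 or near n_a × n_b indicate that one health state tends to have consistently lower or higher abundance than the other.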

Statistical analysis can be operator implemented or implemented by machine learning.

1. Machine Learning

In certain embodiments, statistical analysis is enhanced through the use of machine learning tools. Such tools employ learning algorithms, in which the relevant variable or variables are measured in the different possible states, and patterns differentiating the states are determined and used to classify a test subject. Accordingly, any classification method of this disclosure can be developed by comparing measurements of one or more variables in subjects belonging to the various health condition states. This includes, for example, determining a relative amount of microorganisms belonging to a designated genus to an amount of all microorganisms in a sample, and using this measure to predict the state of the condition using a model as described herein.
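The relative amount described above is simply the genus count divided by the count of all microorganisms in the sample. A minimal sketch, assuming the sample has already been summarized as a dictionary of genus-level counts (the function name and the example counts are hypothetical):

```python
def relative_abundance(genus_counts, genus):
    """Fraction of all counted microbes in the sample that belong to `genus`.
    A genus absent from the sample has relative abundance 0."""
    total = sum(genus_counts.values())
    if total == 0:
        raise ValueError("sample has no counted microbes")
    return genus_counts.get(genus, 0) / total

# Hypothetical genus-level read counts for one stool sample.
counts = {"Bifidobacterium": 30, "Lactobacillus": 10, "Prevotella": 60}
bifido_fraction = relative_abundance(counts, "Bifidobacterium")
```

The resulting fraction is the per-genus input a model of the kind described herein would consume.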

A variety of machine learning algorithms can be used to infer a condition or state of a subject. Machine learning algorithms may be supervised or unsupervised. Learning algorithms include, for example, artificial neural networks (e.g., backpropagation networks), discriminant analyses (e.g., Bayesian classifier or Fisher analysis), support vector machines, decision trees (e.g., recursive partitioning processes such as CART (classification and regression trees)), random forests, linear classifiers (e.g., multiple linear regression (MLR), partial least squares (PLS) regression and principal components regression (PCR)), hierarchical clustering and cluster analysis.

The learning algorithm is programmed to train on data at the genus level. As discussed, the quantitative measures provided in the dataset may be presented at the genus level, or may be presented at the species level from which genus-level information can be derived. For example, quantitative measures of amounts of a plurality of species belonging to the same genus can be combined to provide a single quantitative measure for the genus.
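Combining species-level measures into one genus-level measure can be done by summing counts per genus. In this minimal sketch the species-to-genus lookup table is an explicit, hypothetical input rather than being parsed from species names, since genus assignments can change over time; the function name is likewise illustrative.

```python
from collections import defaultdict

def collapse_to_genus(species_counts, species_to_genus):
    """Sum species-level counts into genus-level counts.
    Raises KeyError if a species has no genus assignment in the table."""
    genus_counts = defaultdict(float)
    for species, count in species_counts.items():
        genus_counts[species_to_genus[species]] += count
    return dict(genus_counts)

# Hypothetical species-level counts and genus assignments.
species_counts = {
    "Bifidobacterium longum": 12,
    "Bifidobacterium adolescentis": 8,
    "Lactobacillus reuteri": 5,
}
species_to_genus = {
    "Bifidobacterium longum": "Bifidobacterium",
    "Bifidobacterium adolescentis": "Bifidobacterium",
    "Lactobacillus reuteri": "Lactobacillus",
}
genus_counts = collapse_to_genus(species_counts, species_to_genus)
```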

C. Model

The end product of statistical analysis is a mathematical model that predicts or infers a state of a health condition (“health state”) based on quantitative measures of microbes at the genus level. Such mathematical models can also be referred to as diagnostic models or classifiers. In certain embodiments, the model generates a single score, such as a number. This number can be compared to a cutoff level, e.g., one determined by the learning algorithm, to infer a health state. For example, if the number is above the cutoff, the individual can be classified as having the health condition (the state of the health condition is “present”).
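A single-score model of this kind can be sketched as a weighted sum of genus-level relative abundances compared against a cutoff. The linear form, the weights, the intercept and the cutoff below are all hypothetical placeholders; a real model would learn its form and parameters from the dataset. Genus keys reuse the grouping numbers listed above.

```python
def model_score(rel_abundances, weights, intercept=0.0):
    """Hypothetical linear score: intercept plus weighted relative abundances.
    Genera missing from the sample contribute 0."""
    return intercept + sum(w * rel_abundances.get(genus, 0.0)
                           for genus, w in weights.items())

def infer_state(score, cutoff):
    """Call the health condition 'present' when the score is above the cutoff."""
    return "present" if score > cutoff else "absent"

# Illustrative weights and cutoff only; not values from this disclosure.
weights = {"929": 2.0, "878": 1.0}
sample = {"929": 0.10, "878": 0.05, "327": 0.20}
score = model_score(sample, weights)
state = infer_state(score, cutoff=0.2)
```

A single-genus model is the special case of one weight, where the score is proportional to that genus's relative abundance.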

A model may rely on a single genus for classification of a health condition into a health state, or may use a plurality of genera to do so. For example, the number of genera used by the model may be equal to, at least or no more than any of 1 genus or 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12 different genera.

A model can be selected which provides a prediction at a predetermined level of sensitivity, specificity, positive predictive value or negative predictive value. So, for example, if the classifier employs a cutoff, and a pathological state is inferred if a score is above the cutoff, then the cutoff level may be raised to increase specificity or lowered to increase sensitivity.
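The cutoff trade-off can be made concrete by scanning candidate cutoffs on labeled training scores. This sketch (hypothetical function names; candidate cutoffs restricted to observed score values for simplicity) returns the lowest cutoff that reaches a target specificity. Because sensitivity can only fall as the cutoff rises, the lowest qualifying cutoff keeps sensitivity as high as the specificity target allows.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Sensitivity and specificity when 'positive' means score > cutoff.
    labels: truthy for subjects who actually have the condition."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    sens = sum(s > cutoff for s in pos) / len(pos)
    spec = sum(s <= cutoff for s in neg) / len(neg)
    return sens, spec

def lowest_cutoff_for_specificity(scores, labels, target_spec):
    """Scan observed scores in ascending order and return (cutoff, sens, spec)
    for the first cutoff whose specificity meets the target, else None."""
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        if spec >= target_spec:
            return cutoff, sens, spec
    return None

# Hypothetical scores and verified health states (1 = condition present).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [0, 0, 1, 0, 1, 1]
choice = lowest_cutoff_for_specificity(scores, labels, target_spec=1.0)
```

Here a low cutoff of 0.15 gives sensitivity 1.0 but specificity 1/3, while the selected cutoff of 0.4 reaches specificity 1.0 at the cost of sensitivity 2/3.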

The sensitivity of a test is the percentage of actual positives that test positive (TP [true positive]/(TP+FN [false negative])). The specificity of a test is the percentage of actual negatives that test negative (TN [true negative]/(FP [false positive]+TN)). The positive predictive value of a test is the probability that a subject that tests positive is an actual positive (TP/(TP+FP)). The negative predictive value of a test is the probability that a subject that tests negative is an actual negative (TN/(FN+TN)). Accordingly, a classifier can be selected that has a sensitivity, specificity, positive predictive value or negative predictive value of at least any of 75%, 80%, 85%, 90%, 95%, 98% or 99%.
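The four definitions above translate directly into code. A minimal sketch with hypothetical confusion-matrix counts, followed by a check of whether every metric meets an 80% floor:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # actual positives that test positive
        "specificity": tn / (tn + fp),   # actual negatives that test negative
        "ppv": tp / (tp + fp),           # test positives that are actual positives
        "npv": tn / (tn + fn),           # test negatives that are actual negatives
    }

# Hypothetical counts for illustration only.
m = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
meets_80 = all(v >= 0.80 for v in m.values())
```

With these counts, sensitivity is 0.90, specificity 0.80, PPV about 0.82 and NPV about 0.89.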

The model can also provide a confidence level for the prediction.

A model can produce a result that is positively associated with one state of the health condition and negatively associated with other states. For example, a positive association predicts that a particular health state is present. A negative association predicts that a particular health state is not present.

III. Applying A Model To Predict A State Of A Health Condition

Models developed as described herein can be used to predict health states of individual subjects. In certain embodiments, the methods involve providing a sample comprising microbiota, e.g., gut microbiota, from a subject. For example, the sample can be a fecal sample. The sample is then processed to quantify the amounts of each of one or a plurality of specified microbial genera in the sample relative to the quantity of total microbes in the sample. A specified microbial genus can be a genus the quantities of which are associated with the health state. The test data generated could include quantitative measures of total microbes and each specified microbial genus in the sample. Alternatively, the test data could include quantitative measures of each of a number of genera not included among the specified genera, as well as quantitative measures of the specified genera.

A biological sample from a subject can comprise, for example, human excreta (e.g., feces (stool) or urine), blood, serum, plasma, saliva, or a throat swab.

A. Determining Quantitative Amounts of Microbes in a Sample

Quantifying specific genera in a sample can include targeting the specified genera for amplification, e.g., by PCR. Specific genera can be targeted for amplification, for example, by using genus-specific 16S rRNA amplification primers. Alternatively, a collection of species-specific primers for the designated genus can be used. Quantification can be performed, for example, by qPCR. In certain embodiments, each specified genus can be quantified separately. In other embodiments, the collection of specified genera can be quantified together in a multiplexed process such as multiplexed qPCR.

1. DNA Sequencing

Nucleic acids to be sequenced can include both DNA and RNA. Sequencing nucleic acids typically involves converting raw nucleic acids into a form compatible with a high throughput sequencer. This may include, in the case of RNA, reverse transcribing RNA into DNA. It may also include attaching adapter molecules to DNA molecules for use with a particular DNA sequencer.

Nucleic acids can be sequenced by any method known in the art to produce sequence reads comprising nucleotide sequences. Typically, nucleic acid sequencing is performed by high throughput sequencing. High throughput sequencing refers to the simultaneous or near simultaneous sequencing of thousands of nucleic acid molecules. High throughput sequencing is sometimes referred to as “next generation sequencing” (“NGS”) or “massively parallel sequencing”. Platforms for high throughput sequencing can be any suitable platform and include, without limitation, massively parallel signature sequencing (MPSS), Polony sequencing, 454 pyrosequencing, Illumina (Solexa) sequencing, SOLiD sequencing, Ion Torrent semiconductor sequencing, DNA nanoball sequencing, Heliscope single molecule sequencing, single molecule real time (SMRT) sequencing (PacBio), and nanopore DNA sequencing (e.g., Oxford Nanopore). Barcoding approaches, such as 10× Genomics (Pleasanton, Calif.) sequencing methods, also can be used. Raw sequence reads are typically subjected to bioinformatic analysis to transform the data into a format more useful for study. For example, sequence reads may be quantified to determine absolute or relative numbers of molecules having the same nucleotide sequence or derived from the same gene or the same region of the genome. Sequence data can be further analyzed, for example, to determine quantitative measures (e.g., absolute or relative amounts) of microorganisms in specific categories at different taxonomic levels. This includes, for example, identification of different phyla, classes, orders, families, genera and/or species of microorganisms.

Accordingly, quantitative information about specific sequences can be used to determine quantitative measures of microbes belonging to each of a plurality of groupings within a taxonomic level, for example, quantitative measures of microbes belonging to different genera.
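The two steps above, counting reads that share an identical sequence and then summing those counts per genus, can be sketched as follows. The exact-match lookup table from sequence to genus is a hypothetical stand-in for a real classifier or reference-database mapping such as those reviewed in the references cited above; reads with no assignment are pooled under an assumed "unassigned" label.

```python
from collections import Counter

def genus_counts_from_reads(reads, sequence_to_genus):
    """Tally reads per unique (case-normalized) sequence, then sum the
    tallies per genus. Unmapped sequences are pooled as 'unassigned'."""
    per_sequence = Counter(read.upper() for read in reads)
    per_genus = Counter()
    for seq, n in per_sequence.items():
        per_genus[sequence_to_genus.get(seq, "unassigned")] += n
    return per_genus

# Toy reads and a toy sequence-to-genus table (hypothetical, not real 16S data).
reads = ["ACGT", "acgt", "GGCC", "TTAA"]
table = {"ACGT": "Bifidobacterium", "GGCC": "Lactobacillus"}
per_genus = genus_counts_from_reads(reads, table)
```

Dividing each genus count by the total number of reads then gives the relative amounts used by the models described herein.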

2. qPCR

In another embodiment, microbes belonging to specific taxonomic groups, e.g., genera, can be quantified by qPCR of nucleic acids having sequences characteristic of the taxonomic group. Quantitative PCR (qPCR) (also referred to as real-time PCR) involves amplification of nucleic acids using amplification primers selected to amplify nucleic acids having predetermined sequences. Quantitative PCR also uses a detectable marker, such as a fluorophore, to measure production of amplified product. After each round of PCR, the amount of marker is detected. The original amount of a particular nucleic acid sequence in a sample is a function of the number of PCR rounds required to achieve a certain signal strength. Detection can be performed by nonspecific detection. This can involve using a dye, such as SYBR Green, that binds to double-stranded DNA. Alternatively, detection can be performed by specific detection. This can involve using detectably tagged reporter probes that bind only to sequences of interest.
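Under the standard assumption, not stated explicitly above, of a constant per-cycle amplification efficiency E (E = 1 for perfect doubling), the dependence of the starting amount N_0 on the threshold cycle C_t, at which the product reaches a fixed detection level N_T, can be written as:

```latex
N_c = N_0 \, (1 + E)^{c}
\qquad
N_T = N_0 \, (1 + E)^{C_t}
\;\Rightarrow\;
N_0 = N_T \, (1 + E)^{-C_t}
\;\Rightarrow\;
C_t = \log_{1+E} N_T - \log_{1+E} N_0
```

Thus C_t falls linearly as log N_0 rises; at E = 1, each ten-fold increase in starting amount lowers C_t by log_2 10, or about 3.32 cycles.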

In the present methods, PCR primers that specifically amplify sequences of specified genera can be used. Examples of these primers and sequences can be found in the high-throughput real-time quantitative PCR-based ‘GUt Low-Density Array’ (GULDA) primer set. See FEMS Microbiology Letters, Volume 337, Issue 1, 1 Dec. 2012, Pages 38-47, https://doi.org/10.1111/1574-6968.12004 and references therein, denoting important phyla, genera, species, or other taxonomic groups within the five predominant bacterial phyla of the gut. Other assay sets can be designed by practitioners versed in PCR design optimization.

3. Microarray Analysis

In another embodiment, microbes can be quantified by microarray analysis. Microarrays are solid supports having, at specific addressable locations, probes that hybridize to specific nucleotide sequences. Nucleic acids from the samples, typically bearing a detectable label, are contacted with the microarray and allowed to hybridize with the probes. The strength of the signal at each addressable location is a function of the amount of the corresponding nucleic acid in the sample. An operator can determine one or a plurality of probes in the microarray that hybridize to sequences of specific taxonomic groups. Based on the signal provided by the microarray, amounts of nucleic acids derived from specific taxonomic groups can be determined. A human gut chip, “HuGChip”, for determining microbiome diversity at the family level has been described in PLoS One. 2013 May 17; 8(5):e62544. doi: 10.1371/journal.pone.0062544. Print 2013. Custom microbiota microarrays based on the Affymetrix GeneChip platform and the PhyloChip assay (secondgenome.com/solutions/services/phylochip/) have also been described at world wide web site aem.asm.org/content/75/11/3572. Axiom Microbiome arrays are also available from Thermo Fisher at world wide web site affymetrix.com/support/technical/byproduct.affx?product=axiom_microbiome_array.

4. Exemplary Methods

In one exemplary protocol, a stool sample from a subject is provided. Microbial cells are isolated from the sample. Total RNA is isolated from the sample by, for example, guanidinium isothiocyanate precipitation. RNA is reverse transcribed into cDNA. The preponderance of cellular RNA is rRNA, so the removal of other RNA types is optional. cDNA can be analyzed by DNA sequencing, qPCR or microarray analysis. Microarrays for analyzing rRNA sequences are described in the literature. See, e.g., Loy, A. et al., “Oligonucleotide Microarray for 16S rRNA Gene-Based Detection of All Recognized Lineages of Sulfate-Reducing Prokaryotes in the Environment” Appl Environ Microbiol. 2002 October; 68(10): 5064-5081, doi: 10.1128/AEM.68.10.5064-5081.2002, PMCID: PMC126405, PMID: 12324358.

In another exemplary protocol, a stool sample from a subject is provided. Microbial cells are isolated from the sample. Total DNA is isolated from the sample by standard methods. Sequences corresponding to rRNA can be amplified from the genomic DNA by, for example, universal rRNA primers. The amplified DNA can be analyzed by high throughput sequencing, qPCR or microarray analysis.

In other exemplary embodiments, parallel amplification processes are performed. A first amplification process amplifies all rRNA sequences in the sample. A second amplification process specifically amplifies rRNA sequences from diagnostic genera. The amounts of both can be compared to determine a relative amount of diagnostic sequences to total rRNA sequences.
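When both parallel amplifications are read out by qPCR, one common way to compare them is the delta-Ct calculation sketched below. This is a minimal sketch under stated assumptions: both reactions share the same amplification efficiency and detection threshold, so the starting amount is proportional to (1 + E) raised to minus Ct. The function name and Ct values are hypothetical.

```python
def relative_amount_from_ct(ct_genus, ct_total, efficiency=1.0):
    """Estimate (diagnostic-genus copies) / (total rRNA copies) from
    threshold cycles, assuming equal efficiency E in both reactions."""
    return (1.0 + efficiency) ** (ct_total - ct_genus)

# Hypothetical Ct values: the genus assay crosses threshold 5 cycles
# after the total-rRNA assay, i.e. 2 ** -5 of the total at E = 1.
fraction = relative_amount_from_ct(ct_genus=25.0, ct_total=20.0)
```

If the two assays have measurably different efficiencies, an efficiency-corrected ratio or standard curves for each assay would be needed instead.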

B. Tests

The American Gut Project database, a public database of fecal 16S rRNA sequence data (>14,000 patients) with varying disease states, was examined. Data from patients whose information had not been verified by a medical practitioner, and from sets of patients whose questionnaire-response cohorts were too small to draw conclusions from, were excluded. The vast majority of 16S diversity between patients is captured by relatively few genera: about 97.8% of samples have at least 50% of unique hits in <=5 genera, and 99.7% in <=10 genera. See, e.g., FIG. 1.
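The concentration statistic above can be computed per sample as the smallest number of genera whose combined counts reach 50% of the sample, then summarized as the share of samples needing at most k genera. The sketch below assumes "unique hits" can be represented as per-genus counts; the function names and toy samples are hypothetical and do not reproduce the American Gut data.

```python
def genera_needed(genus_counts, fraction=0.5):
    """Smallest number of genera (taken from most to least abundant) whose
    combined counts reach `fraction` of the sample total."""
    total = sum(genus_counts.values())
    running = 0
    for k, count in enumerate(sorted(genus_counts.values(), reverse=True), start=1):
        running += count
        if running >= fraction * total:
            return k
    return len(genus_counts)

def share_of_samples_within(samples, max_genera, fraction=0.5):
    """Fraction of samples whose top `max_genera` genera cover `fraction`."""
    hits = sum(genera_needed(s, fraction) <= max_genera for s in samples)
    return hits / len(samples)

# Toy samples: one dominated by a single genus, two more evenly spread.
s1 = {"A": 60, "B": 30, "C": 10}
s2 = {"A": 20, "B": 20, "C": 20, "D": 20, "E": 20}
s3 = {g: 10 for g in "ABCDEFGHIJ"}
```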

After grouping patients into 10 different health categories, the filtered data set was subjected to statistical analysis comparing health categories via a least-squares means analysis of variance methodology.
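For a single grouping factor with no covariates, least-squares means reduce to the ordinary group means, so the classic one-way ANOVA F statistic is the simplest instance of the comparison described above. The sketch below computes that F statistic only (no p-value, no covariate adjustment, which a full least-squares means analysis would include); `scipy.stats.f_oneway` offers the same test with a p-value. The function name and groups are hypothetical.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square.
    `groups` is a list of lists, e.g. genus relative abundances per health category."""
    all_values = [x for g in groups for x in g]
    k, n = len(groups), len(all_values)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy data for three health categories.
f_stat = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Here the group means are 2, 5 and 8, giving a between-group sum of squares of 54, a within-group sum of squares of 6, and F = (54/2)/(6/6) = 27.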

1. Microbial Taxonomic Grouping “929” is Positively Associated with Overweight Body Mass Index with Type II Diabetes

Microbial taxonomic grouping “929” is associated with a positive diagnosis of overweight body mass index with type II diabetes. For example, a relative amount of grouping “929” to total microbes above a cutoff predicts this health state. Put another way, where the percent of taxonomic grouping “929” microbes in a gut microbiota sample is above the cutoff, one can predict the presence of overweight body mass index with type II diabetes.

2. Microbial Taxonomic Grouping “878” is Positively Associated with Overweight Body Mass Index with Type II Diabetes

Microbial taxonomic grouping “878” is associated with a positive diagnosis of overweight body mass index with type II diabetes. For example, a relative amount of grouping “878” to total microbes above a cutoff predicts this health state. Put another way, where the percent of taxonomic grouping “878” microbes in a gut microbiota sample is above the cutoff, one can predict the presence of overweight body mass index with type II diabetes.

3. Microbial Taxonomic Grouping “327” is Positively Associated with Normal Body Mass Index with Ulcerative Colitis

Microbial taxonomic grouping “327” is associated with a positive diagnosis of normal body mass index with ulcerative colitis. For example, a relative amount of grouping “327” to total microbes above a cutoff predicts this health state. Put another way, where the percent of taxonomic grouping “327” microbes in a gut microbiota sample is above the cutoff, one can predict the presence of normal body mass index with ulcerative colitis.

IV. Wellness and Therapeutic Interventions

The associations described herein between relative amounts of specified microbial genera and health states are useful in the prescription, administration and monitoring of interventions to alter or maintain these states. Such interventions include, without limitation, administration of a food, supplement or probiotic, or a lifestyle change, any of which can be provided in an amount or degree sufficient to alter or maintain the state of a health condition. Such interventions may improve the wellness of a subject and/or may produce a therapeutic effect. A wellness intervention (including therapeutic intervention) protocol designed to address a particular health condition can be referred to as a wellness intervention plan. Such a plan could include recommendations of exercise, foods, supplements, probiotics, or pharmaceuticals to include in a lifestyle or diet. It can further specify amounts, timing and/or duration of such interventions.

As used herein, the term “wellness intervention” refers to an intervention that improves wellness of a subject. Wellness can be defined objectively and subjectively.

As used herein, the terms “therapeutic intervention”, “therapy” and “treatment” refer to an intervention that produces a therapeutic effect (e.g., is “therapeutically effective”). Therapeutically effective interventions prevent, slow the progression of, delay the onset of symptoms of, improve the condition of (e.g., cause remission of), improve symptoms of, or cure a health condition. A therapeutic intervention can include, for example, administration of a treatment, administration of a pharmaceutical, or administration of a biologic or nutraceutical substance with therapeutic intent. The response to a therapeutic intervention can be complete or partial. In some aspects, the severity of disease is reduced by at least 10%, as compared, e.g., to the individual before administration or to a control individual not undergoing treatment. In some aspects, the severity of disease is reduced by at least 25%, 50%, 75%, 80%, or 90%, or in some cases, is no longer detectable using standard diagnostic techniques. Recognizing that certain sub-groups of subjects may not respond to a therapy, one measure of therapeutic effectiveness can be effectiveness for at least 90% of subjects undergoing the intervention, over at least 100 subjects.

As used herein, the term “effective”, when modifying a therapeutic intervention (“effective treatment” or “treatment effective to”) or an amount of a pharmaceutical drug (“effective amount”), refers to a treatment or amount sufficient to ameliorate a disorder, as described above. For example, for a given parameter, a therapeutically effective amount will show an increase or decrease in the parameter of at least 5%, 10%, 15%, 20%, 25%, 40%, 50%, 60%, 75%, 80%, 90%, or at least 100%. Therapeutic efficacy can also be expressed as a “-fold” increase or decrease. For example, a therapeutically effective amount can have at least a 1.2-fold, 1.5-fold, 2-fold, 5-fold, or greater effect over a control.
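The percent-change and fold-change thresholds above are simple ratios. A minimal sketch with hypothetical function names and hypothetical severity values (not data from this disclosure):

```python
def percent_change(baseline, follow_up):
    """Signed percent change of a parameter from baseline (negative = decrease)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (follow_up - baseline) / baseline

def fold_change(treated, control):
    """Ratio of the treated value to the control value (1.5 means 1.5-fold)."""
    if control == 0:
        raise ValueError("control must be non-zero")
    return treated / control

# Hypothetical disease-severity scores before and after an intervention.
reduction = -percent_change(baseline=40.0, follow_up=30.0)   # percent reduction
meets_25_percent = reduction >= 25.0
```

A severity score falling from 40 to 30 is a 25% reduction, meeting the "at least 25%" level; a treated value of 3.0 against a control of 2.0 is a 1.5-fold effect.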

Thus, according to some methods, a subject is first tested for a gut microbial biomarker of a health condition in a biological sample from the subject. A prediction of the state of a health condition or conditions is determined based on the results. Based on the prediction, a decision can be made regarding the type, amount, route and timing of administering an optimally effective therapeutic intervention to the subject.

V. Computer Systems

Provided herein are computer systems for carrying out processes described herein. Computer systems can include a processor and memory accessible by the processor. For example, the processor can be a central processing unit (CPU). Memory can be in tangible form, for example, read-only memory or random-access memory. Memory can include machine-executable code (e.g., software) that, when executed by the processor, carries out instructions in the code. The software can be, for example, a model as described herein.

The computer system can also include a communication interface (e.g., network adapter) for communicating with one or more other systems and peripheral devices. The computer system can be in communication with a computer network, such as a local area network or the internet. The system can be in communication with the internet through, for example, a high-speed transmission network including, without limitation, Digital Subscriber Line (DSL), Cable Modem, Fiber, Wireless, Satellite, and Broadband over Powerlines (BPL).

The computer system can communicate with one or more remote computer systems through the network. Examples of remote computer systems include personal computers, tablets, smart phones, etc. Communication can be by way of a user interface, such as a graphical user interface.

Accordingly, after making a prediction of a health state as described herein, the prediction and/or a wellness intervention plan to address the condition can be communicated to a user through a communications network such as the internet, where it can be accessed by the user, for example, in password-protected format. So, for example, a method can include: receiving from a subject a sample comprising a microbiome of the subject; predicting a health state of the subject by a method as described herein; and outputting the prediction and/or intervention to an electronic device accessible by the subject. The electronic device can include, for example, a personal computer, a tablet, a smart phone, or a wearable device. More specifically, the method of predicting can include determining nucleic acid sequences of a microbiome of the subject by, for example, (i) amplifying nucleic acids in the sample corresponding to genera associated with the health condition (e.g., by fragmenting nucleic acids from the sample and using primer pairs that specifically or preferentially amplify nucleic acids having nucleotide sequences of the associated genus or genera to amplify such sequences from the sample); (ii) sequencing the amplified nucleic acids to produce sequence reads; (iii) mapping the sequence reads to a reference database of microbial nucleotide sequences, such as genomic sequences; and (iv) measuring absolute or relative amounts of sequence reads mapping to the database. Alternatively, after predicting the state of the health condition, a wellness intervention in line with a wellness plan to ameliorate the health condition can be administered to the subject.

REFERENCES

Cole J R, Wang Q, Cardenas E, Fish J, Chai B, et al. (2009) The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Research 37: D141-D145. 10.1093/nar/gkn879

DeSantis T Z, Hugenholtz P, Larsen N, Rojas M, Brodie E L, et al. (2006) Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Applied and Environmental Microbiology 72: 5069-5072. 10.1128/AEM.03006-05

Pruesse E, Quast C, Knittel K, Fuchs B M, Ludwig W, et al. (2007) SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Research 35: 7188-7196. 10.1093/nar/gkm864

Mitra S, Stärk M, Huson D H (2011) Analysis of 16S rRNA environmental sequences using MEGAN. BMC Genomics 12: S17. 10.1186/1471-2164-12-S3-S17

Caporaso J G, Bittinger K, Bushman F D, DeSantis T Z, Andersen G L, et al. (2010) PyNAST: a flexible tool for aligning sequences to a template alignment. Bioinformatics 26: 266-267. 10.1093/bioinformatics/btp636

Caporaso J G, Kuczynski J, Stombaugh J, Bittinger K, Bushman F D, et al. (2010) QIIME allows analysis of high-throughput community sequencing data. Nature Methods 7: 335-336. 10.1038/nmeth.f.303

Chaudhary N, Sharma A K, Agarwal P, Gupta A, Sharma V K (2015) 16S classifier: a tool for fast and accurate taxonomic classification of 16S rRNA hypervariable regions in metagenomic datasets. PLoS One 10(2): e0116106. Published 2015 Feb. 3.

As used herein, the following meanings apply unless otherwise specified. The word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). The words “include”, “including”, and “includes” and the like mean including, but not limited to. The singular forms “a,” “an,” and “the” include plural referents. Thus, for example, reference to “an element” includes a combination of two or more elements, notwithstanding use of other terms and phrases for one or more elements, such as “one or more.” The term “or” is, unless indicated otherwise, non-exclusive, i.e., encompassing both “and” and “or.” The term “any of” between a modifier and a sequence means that the modifier modifies each member of the sequence. So, for example, the phrase “at least any of 1, 2 or 3” means “at least 1, at least 2 or at least 3”. The term “consisting essentially of” refers to the inclusion of recited elements and other elements that do not materially affect the basic and novel characteristics of a claimed combination.

It should be understood that the description and the drawings are not intended to limit the invention to the particular form disclosed, but to the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present invention as defined by the appended claims. Further modifications and alternative embodiments of various aspects of the invention will be apparent to those skilled in the art in view of this description. Accordingly, this description and the drawings are to be construed as illustrative only and are for the purpose of teaching those skilled in the art the general manner of carrying out the invention. It is to be understood that the forms of the invention shown and described herein are to be taken as examples of embodiments. Elements and materials may be substituted for those illustrated and described herein, parts and processes may be reversed or omitted, and certain features of the invention may be utilized independently, all as would be apparent to one skilled in the art after having the benefit of this description of the invention. Changes may be made in the elements described herein without departing from the spirit and scope of the invention as described in the following claims. Headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description.

1. A method comprising: a) predicting a state of a health condition in a subject by: i) providing a microbiome sample (e.g., a gut, vaginal canal, endometrial cavity, saliva, cell-bound or cell-containing, or other biological-fluid-derived microbiome sample) from a subject; ii) determining, from the sample, a quantitative measure of microbes in one or a plurality of selected microbial genera; and iii) applying a model to the quantitative measure to predict a state of the health condition; and b) recommending or administering to the subject an intervention selected from a food, a supplement, a probiotic, a lifestyle change or a pharmaceutical in an amount or degree sufficient to alter or maintain the state of the health condition.
2. The method of claim 1, wherein the state of the health condition is given as a score, and altering the state of the health condition comprises altering the score.
3. The method of claim 1, further comprising monitoring the state of the health condition in the subject over a period of time by repeating operations (i), (ii) and (iii) one or a plurality of times.
4. The method of claim 1, wherein the health condition is overweight Body Mass Index with Type II diabetes, and the selected microbial genus is taxonomic grouping “929”.
5. The method of claim 4, wherein the quantitative measure is a relative amount of microbes belonging to taxonomic grouping “929” to total microbes.
6. The method of claim 5, wherein the model predicts that overweight Body Mass Index with Type II diabetes is present when the relative amount is above a threshold percent.
7. The method of claim 1, wherein the health condition is normal BMI with ulcerative colitis, and the selected microbial genus is taxonomic grouping “327”.
8. The method of claim 7, wherein the quantitative measure is a relative amount of microbes belonging to taxonomic grouping “327” to total microbes.
9. The method of claim 8, wherein the model predicts that normal BMI with ulcerative colitis is present when the relative amount is above a threshold percent.
10. The method of claim 1, wherein the health condition is overweight Body Mass Index with Type II diabetes, and the selected microbial genus is taxonomic grouping “878”.
11. The method of claim 10, wherein the quantitative measure is a relative amount of microbes belonging to taxonomic grouping “878” to total microbes.
12. The method of claim 1, further comprising altering the intervention over the period of time to maintain the state of the health condition or alter the state of the health condition to a desired state.
13. A method comprising: a) accessing, by computer, a dataset comprising, for each of a plurality of subjects, (1) a medically verified state of one or a plurality of health conditions, and (2) quantitative measures of amounts of microbes belonging to one or a plurality of selected microbial genera in a microbiome of the subject; and b) performing statistical analysis on the dataset to develop a model that predicts the state of the health condition in a subject based on the quantitative measures.
14-66. (canceled)
67. A system comprising: (a) a computer comprising: (i) a processor; (ii) a memory, coupled to the processor, the memory including a quantitative measure of microbes in one or a plurality of selected microbial genera, and a model that, when executed, uses the quantitative measure or measures to predict a state of a health condition for the subject; and (iii) computer executable instructions for implementing the classification rule on the data.
68-73. (canceled)